Recognizing Partial Biometric Patterns
Biometric recognition on partially captured targets is challenging because only
a few partial observations of the objects are available for matching. In this
area, deep learning based methods are widely applied to match partially
captured objects that result from occlusion, posture variation, or the subject
being partly out of view, as in person re-identification and partial face
recognition. However, most current methods cannot identify an individual when some
parts of the object are unobtainable, while the rest are specialized to
certain constrained scenarios. To this end, we propose a robust general
framework for arbitrary biometric matching scenarios that is free of
constraints on alignment and input size. We introduce a feature post-processing
step to handle the feature maps from a fully convolutional network (FCN) and a
dictionary-learning-based Spatial Feature Reconstruction (SFR) to match feature
maps of different sizes. Moreover, the batch hard triplet loss function is applied to
optimize the model. The applicability and effectiveness of the proposed method
are demonstrated by the results from experiments on three person
re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial
person datasets (Partial REID and Partial iLIDS) and two partial face datasets
(CASIA-NIR-Distance and Partial LFW), on which the method achieves
state-of-the-art performance in comparison with several recent approaches. The
code is released online at https://github.com/lingxiao-he/Partial-Person-ReID.
Comment: 13 pages, 11 figures
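The batch hard triplet loss named in this abstract can be illustrated with a minimal pure-Python sketch. It is not the released code: the function name, the Euclidean distance, and the margin value are assumptions made for illustration. For each anchor it takes the farthest same-identity sample and the closest different-identity sample in the batch.

```python
import math


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean batch-hard triplet loss over all anchors with a valid triplet.

    embeddings: list of equal-length float tuples (hypothetical features).
    labels: list of identity labels, one per embedding.
    margin: assumed margin value, not taken from the paper.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        # Distances to every other sample of the same identity.
        pos = [math.dist(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        # Distances to every sample of a different identity.
        neg = [math.dist(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue  # no valid triplet for this anchor
        # Hardest positive is the farthest; hardest negative is the closest.
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0
```

When identities are well separated relative to the margin, every term is clipped to zero and the loss vanishes.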
A Light CNN for Deep Face Representation with Noisy Labels
Convolutional neural network (CNN) models proposed for face recognition have
grown continuously larger to better fit large amounts of
training data. When training data are obtained from the Internet, the labels are
likely to be ambiguous and inaccurate. This paper presents a Light CNN
framework to learn a compact embedding on the large-scale face data with
massive noisy labels. First, we introduce a variation of maxout activation,
called Max-Feature-Map (MFM), into each convolutional layer of the CNN. Unlike
maxout activation, which uses many feature maps to linearly approximate an
arbitrary convex activation function, MFM does so via a competitive
relationship. MFM can not only separate noisy and informative signals but also
perform feature selection between two feature maps. Second, three
networks are carefully designed to obtain better performance while reducing
the number of parameters and computational costs. Lastly, a semantic
bootstrapping method is proposed to make the prediction of the networks more
consistent with noisy labels. Experimental results show that the proposed
framework can utilize large-scale noisy data to learn a Light model that is
efficient in computational costs and storage spaces. The learned single network
with a 256-D representation achieves state-of-the-art results on various face
benchmarks without fine-tuning. The code is released on
https://github.com/AlfredXiangWu/LightCNN.
Comment: arXiv admin note: text overlap with arXiv:1507.04844. The models are
released on https://github.com/AlfredXiangWu/LightCNN, IEEE Transactions on
Information Forensics and Security, 201
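The competitive selection between two feature maps that the abstract attributes to MFM can be sketched as follows. This is a minimal illustration, not the LightCNN code: the function name and the flat list-of-channels layout are assumptions. The channels are split into two halves and the element-wise maximum of each pair is kept, which halves the channel count.

```python
def max_feature_map(channels):
    """Element-wise max over paired channels from the two halves.

    channels: list of channels, each a flat list of activations
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    # Channel i competes with channel i + half; the larger activation wins.
    return [
        [max(a, b) for a, b in zip(channels[i], channels[i + half])]
        for i in range(half)
    ]


features = [[1, -2], [0, 5], [3, -4], [-1, 6]]
print(max_feature_map(features))  # [[3, -2], [0, 6]]
```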
Global and Local Consistent Wavelet-domain Age Synthesis
Age synthesis is a challenging task due to the complicated and non-linear
transformation in the human aging process. Aging information is usually reflected
in local facial parts, such as wrinkles at the eye corners. However, these
local facial parts contribute less in previous GAN based methods for age
synthesis. To address this issue, we propose a Wavelet-domain Global and Local
Consistent Age Generative Adversarial Network (WaveletGLCA-GAN), in which one
global specific network and three local specific networks are integrated
together to capture both global topology information and local texture details
of human faces. Unlike most existing methods, which model age
synthesis in the image domain, we adopt the wavelet transform to depict textural
information in the frequency domain. To achieve accurate age generation
while preserving identity information, an age estimation
network and a face verification network are employed. Moreover, five types of
losses are adopted: 1) adversarial loss aims to generate realistic wavelets; 2)
identity preserving loss aims to better preserve identity information; 3) age
preserving loss aims to enhance the accuracy of age synthesis; 4) pixel-wise
loss aims to preserve the background information of the input face; 5) the
total variation regularization aims to remove ghosting artifacts. Our method is
evaluated on three face aging datasets, including CACD2000, Morph and FG-NET.
Qualitative and quantitative experiments show the superiority of the proposed
method over other state-of-the-art methods.
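The total variation regularization listed among the five losses penalizes differences between neighboring values. The sketch below uses the common anisotropic (absolute-difference) form on a 2-D grid; the abstract does not say which variant the authors use, so the form and the function name are assumptions.

```python
def total_variation(image):
    """Anisotropic total variation of a 2-D grid given as a list of rows."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:  # difference to the neighbor below
                tv += abs(image[i + 1][j] - image[i][j])
            if j + 1 < cols:  # difference to the neighbor on the right
                tv += abs(image[i][j + 1] - image[i][j])
    return tv


flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
spike = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(total_variation(flat), total_variation(spike))  # 0.0 4.0
```

A smooth image scores zero while an isolated spike is penalized, which is why such a term discourages high-frequency artifacts.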
Learning Structured Ordinal Measures for Video based Face Recognition
This paper presents a structured ordinal measure method for video-based face
recognition that simultaneously learns ordinal filters and structured ordinal
features. The problem is posed as a non-convex integer programming problem that
includes two parts. The first part learns stable ordinal filters to project
video data into a large-margin ordinal space. The second seeks self-correcting
and discrete codes by balancing the projected data and a rank-one ordinal
matrix in a structured low-rank way. Unsupervised and supervised structures are
considered for the ordinal matrix. In addition, as a complement to hierarchical
structures, deep feature representations are integrated into our method to
enhance coding stability. An alternating minimization method is employed to
handle the discrete and low-rank constraints, yielding high-quality codes that
capture prior structures well. Experimental results on three commonly used face
video databases show that our method with a simple voting classifier can
achieve state-of-the-art recognition rates using fewer features and samples.
A Coupled Evolutionary Network for Age Estimation
Age estimation of unknown persons is a challenging pattern analysis task due
to the lack of training data and the varied aging mechanisms of different
people. Label distribution learning-based methods usually make distribution
assumptions to simplify age estimation. However, age label distributions are
often complex and difficult to model parametrically. Inspired by the
biological evolutionary mechanism, we propose a Coupled Evolutionary Network
(CEN) with two concurrent evolutionary processes: evolutionary label
distribution learning and evolutionary slack regression. The evolutionary network
learns and refines age label distributions iteratively.
Evolutionary label distribution learning adaptively learns and constantly
refines the age label distributions without making strong assumptions on the
distribution patterns. To further utilize the ordered and continuous
information of age labels, we propose an evolutionary slack
regression that converts discrete age label regression into continuous age
interval regression. Experimental results on the Morph, ChaLearn15 and
MegaAge-Asian datasets show the superiority of our method.
Deep Supervised Discrete Hashing
With the rapid growth of image and video data on the web, hashing has been
extensively studied for image or video search in recent years. Benefiting from
recent advances in deep learning, deep hashing methods have achieved promising
results for image retrieval. However, there are some limitations of previous
deep hashing methods (e.g., the semantic information is not fully exploited).
In this paper, we develop a deep supervised discrete hashing algorithm based on
the assumption that the learned binary codes should be ideal for
classification. Both the pairwise label information and the classification
information are used to learn the hash codes within a one-stream framework. We
constrain the outputs of the last layer to be binary codes directly, which is
rarely investigated in deep hashing algorithms. Because of the discrete nature
of hash codes, an alternating minimization method is used to optimize the
objective function. Experimental results have shown that our method outperforms
current state-of-the-art methods on benchmark datasets.
Comment: Accepted by NIPS 201
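To show how binary hash codes are typically used for retrieval, here is a minimal sketch under stated assumptions. It is not the authors' algorithm: the helper names, the {-1, +1} code convention, and the toy database are hypothetical. Real-valued outputs are binarized by sign, and database items are ranked by Hamming distance to the query code.

```python
def binarize(values):
    """Map real-valued outputs to a {-1, +1} code by sign (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def hamming(code_a, code_b):
    """Number of positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def rank_by_hamming(query, database):
    """Return database keys sorted by Hamming distance to the query code."""
    return sorted(database, key=lambda key: hamming(query, database[key]))


query = binarize([0.3, -1.2, 0.0, 2.5])  # [1, -1, 1, 1]
database = {
    "a": [1, -1, 1, -1],
    "b": [-1, 1, -1, -1],
    "c": [1, -1, 1, 1],
}
print(rank_by_hamming(query, database))  # ['c', 'a', 'b']
```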
Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition
Heterogeneous face recognition (HFR) aims to match facial images acquired
from different sensing modalities with mission-critical applications in
forensics, security and commercial sectors. However, HFR is a much more
challenging problem than traditional face recognition because of large
intra-class variations of heterogeneous face images and limited training
samples of cross-modality face image pairs. This paper proposes a novel
approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for
short), to learn invariant features between near-infrared and visual face images
(i.e., NIR-VIS face recognition). The low-level layers of WCNN are trained with
widely available face images in the visual spectrum. The high-level layer is
divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS
shared layer. The first two layers aim to learn modality-specific features, and
the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace.
Wasserstein distance is introduced into the NIR-VIS shared layer to measure the
dissimilarity between heterogeneous feature distributions. WCNN learning thus
aims to minimize the Wasserstein distance between the NIR
distribution and the VIS distribution for invariant deep feature representation of
heterogeneous face images. To avoid the over-fitting problem on small-scale
heterogeneous face data, a correlation prior is introduced on the
fully-connected layers of WCNN to reduce the parameter space. This prior is
implemented by a low-rank constraint in an end-to-end network. The joint
formulation leads to an alternating minimization for deep feature
representation at training stage and an efficient computation for heterogeneous
data at testing stage. Extensive experiments on three challenging NIR-VIS face
recognition databases demonstrate the significant superiority of Wasserstein
CNN over state-of-the-art methods.
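As a rough illustration of measuring dissimilarity between two feature distributions with a Wasserstein distance, the sketch below computes the empirical 1-Wasserstein distance between two equal-sized sets of scalar samples. The abstract does not give the formulation used in WCNN, so this 1-D form, the function name, and the toy values are assumptions rather than the paper's method.

```python
def wasserstein_1d(samples_a, samples_b):
    """Empirical 1-Wasserstein distance between equal-sized 1-D samples.

    In one dimension the optimal transport plan pairs sorted samples,
    so the distance is the mean absolute difference of the sorted values.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


nir_feature = [3.0, 0.0, 1.0]  # hypothetical NIR activations
vis_feature = [8.0, 5.0, 6.0]  # hypothetical VIS activations
print(wasserstein_1d(nir_feature, vis_feature))  # 5.0
```

Shifting one distribution toward the other lowers the value, which matches the stated goal of minimizing the distance between the NIR and VIS distributions.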
M2FPA: A Multi-Yaw Multi-Pitch High-Quality Database and Benchmark for Facial Pose Analysis
Facial images in surveillance or mobile scenarios often have large view-point
variations in terms of pitch and yaw angles. These jointly occurring angle
variations make face recognition challenging. Current public face databases
mainly consider the case of yaw variations. In this paper, a new large-scale
Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose
Analysis (M2FPA), including face frontalization, face rotation, facial pose
estimation and pose-invariant face recognition. It contains 397,544 images of
229 subjects with variations in yaw, pitch, attribute, illumination and
accessory. M2FPA is
the most comprehensive multi-view face database for facial pose analysis.
Further, we provide an effective benchmark for face frontalization and
pose-invariant face recognition on M2FPA with several state-of-the-art methods,
including DR-GAN, TP-GAN and CAPG-GAN. We believe that the new database and
benchmark can significantly push forward the advance of facial pose analysis in
real-world applications. Moreover, a simple yet effective parsing guided
discriminator is introduced to capture the local consistency during GAN
optimization. Extensive quantitative and qualitative results on M2FPA and
Multi-PIE demonstrate the superiority of our face frontalization method.
Baseline results for both face synthesis and face recognition from
state-of-the-art methods demonstrate the challenge offered by this new database.
Comment: Accepted for publication at ICCV2019; The M2FPA dataset is available
at https://pp2li.github.io/M2FPA-dataset
Foreground-aware Pyramid Reconstruction for Alignment-free Occluded Person Re-identification
Re-identifying a person across multiple disjoint camera views is important
for intelligent video surveillance, smart retailing and many other
applications. However, existing person re-identification (ReID) methods are
challenged by the ubiquitous occlusion over persons and suffer from performance
degradation. This paper proposes a novel occlusion-robust and alignment-free
model for occluded person ReID and extends its application to realistic and
crowded scenarios. The proposed model first leverages a fully convolutional
network (FCN) and pyramid pooling to extract spatial pyramid features. Then an
alignment-free matching approach, namely Foreground-aware Pyramid
Reconstruction (FPR), is developed to accurately compute matching scores
between occluded persons, despite their different scales and sizes. FPR uses
the error from robust reconstruction over spatial pyramid features to measure
similarities between two persons. More importantly, we design an
occlusion-sensitive foreground probability generator that focuses more on clean
human body parts to refine the similarity computation with less contamination
from occlusion. FPR can be easily embedded into any end-to-end person ReID
model. The effectiveness of the proposed method is clearly demonstrated by the
experimental results (Rank-1 accuracy) on three occluded person datasets:
Partial REID (78.30%), Partial iLIDS (68.08%) and Occluded REID (81.00%);
and three benchmark person datasets: Market1501 (95.42%), DukeMTMC (88.64%)
and CUHK03 (76.08%).
Comment: 10 pages, 7 figures
Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework
Iris segmentation and localization in non-cooperative environments are
challenging due to illumination variations, long distances, moving subjects and
limited user cooperation. Traditional methods often suffer from poor
performance when confronted with iris images captured in these conditions.
Recent studies have shown that deep learning methods can achieve impressive
performance on the iris segmentation task. In addition, as the iris is defined as an
annular region between the pupil and sclera, geometric constraints can be imposed
to help locate the iris more accurately and improve the segmentation results.
In this paper, we propose a deep multi-task learning framework, named
IrisParseNet, to exploit the inherent correlations between pupil, iris and
sclera to boost the performance of iris segmentation and localization in a
unified model. In particular, IrisParseNet first applies a Fully
Convolutional Encoder-Decoder Attention Network to simultaneously estimate
pupil center, iris segmentation mask and iris inner/outer boundary. Then, an
effective post-processing method is adopted for iris inner/outer circle
localization. To train and evaluate the proposed method, we manually label three
challenging iris datasets, namely CASIA-Iris-Distance, UBIRIS.v2, and MICHE-I,
which cover various types of noises. Extensive experiments are conducted on
these newly annotated datasets, and results show that our method outperforms
state-of-the-art methods on various benchmarks. All the ground-truth
annotations, annotation code and evaluation protocols are publicly available
at https://github.com/xiamenwcy/IrisParseNet.
Comment: 13 pages
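The geometric constraint mentioned in this abstract, that the iris is an annular region between the pupil and the sclera, can be sketched as a simple point test. This is not IrisParseNet's post-processing: the function name and the (cx, cy, r) circle representation are hypothetical, and the inner and outer circles are not required to be concentric.

```python
import math


def in_iris_annulus(point, pupil_circle, iris_circle):
    """True if point lies inside the outer circle but outside the inner one.

    Each circle is an assumed (center_x, center_y, radius) tuple.
    """
    x, y = point
    pcx, pcy, pr = pupil_circle
    icx, icy, ir = iris_circle
    outside_pupil = math.hypot(x - pcx, y - pcy) > pr
    inside_iris = math.hypot(x - icx, y - icy) <= ir
    return outside_pupil and inside_iris


pupil = (0.0, 0.0, 1.0)
iris = (0.0, 0.0, 3.0)
print(in_iris_annulus((2.0, 0.0), pupil, iris))  # True
print(in_iris_annulus((0.5, 0.0), pupil, iris))  # False
```

A test like this could be used to discard segmentation pixels that fall inside the pupil or beyond the outer boundary.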